sns.set_style('darkgrid')
sns.set_palette('colorblind')
G = graphs.Community(N=300, Nc=7)
print(f'{G.N} nodes, {G.Ne} edges')
print(G.N)
print(G.Ne)
print(G.is_connected())
print(G.is_directed())
G.plot()
G2 = graphs.Graph(G.W)
G2.set_coordinates(kind='spring') #here with a spring based algorithm
G2.plot()
G2 = graphs.Graph(G.W)
G2.set_coordinates(kind='ring2D') # here on a 2D ring
G2.plot()
v_in, v_out, weights = G.get_edge_list()
print(v_in.shape, v_out.shape, weights.shape)
Graphs can also be constructed from a set of points in $\mathbb{R}^d$.
Let $\mathbf{X}$ be a data matrix $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N]^\intercal \in \mathbb{R}^{N \times d}$, where each $\mathbf{x} \in \mathbb{R}^d$ is a sample for which we have $d$ features.
We may separate the task of building a graph from a set of points into two sub-tasks: computing a similarity score between every pair of points, then sparsifying the resulting dense similarity graph (e.g., by keeping only the $k$ nearest neighbours of each node, or only the edges shorter than a radius $\epsilon$).
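A minimal numpy sketch of these two sub-tasks, computing Gaussian similarities and then thresholding them at a radius (the kernel width and radius are arbitrary illustrative values; PyGSP's NNGraph implements variants of this internally):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = rng.uniform(size=(30, 3))            # 30 points in R^3

# Sub-task 1: turn pairwise distances into similarities (Gaussian kernel).
sigma = 0.2
dist = squareform(pdist(X))              # (30, 30) Euclidean distances
W = np.exp(-dist**2 / (2 * sigma**2))
np.fill_diagonal(W, 0)                   # no self-loops

# Sub-task 2: sparsify, e.g. keep only edges shorter than a radius epsilon.
epsilon = 0.3
W[dist > epsilon] = 0
```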
n_points = 300
dimensionality = 3
points = np.random.uniform(size=(n_points, dimensionality))
plt.scatter(points[:, 0], points[:, 1]);
sigma=0.2
G = graphs.NNGraph(points, NNtype='knn', k=2, sigma = sigma**2, rescale=False, symmetrize_type='maximum')
print(f'{G.N} nodes, {G.Ne} edges')
G.plot()
G = graphs.NNGraph(points, NNtype='radius', epsilon=0.1, sigma = sigma**2, rescale=False, symmetrize_type='maximum')
print(f'{G.N} nodes, {G.Ne} edges')
G.plot()
We consider here only undirected graphs, such that the Laplacian matrix is real symmetric, thus diagonalizable in an orthonormal eigenbasis $$\mathbf{L}=\mathbf{U}\mathbf{\Lambda U}^\top,$$ where $\mathbf{U}=(\mathbf{u}_1|\ldots|\mathbf{u}_N)\in\mathbb{R}^{N\times N}$ is the matrix of orthonormal eigenvectors and $\mathbf{\Lambda} = \text{diag}(\lambda_1,\ldots,\lambda_N)$ is the diagonal matrix of associated sorted eigenvalues: $$\lambda_1\leq\lambda_2\leq\ldots\leq\lambda_N.$$
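A quick numerical check of this decomposition, on a tiny hand-built path graph (an illustrative choice, not the Community graph used below):

```python
import numpy as np

# Combinatorial Laplacian L = D - W of a 3-node path graph.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

# eigh is the eigensolver for symmetric matrices; eigenvalues come out sorted.
eig_val, U = np.linalg.eigh(L)

assert np.all(np.diff(eig_val) >= -1e-12)          # lambda_1 <= ... <= lambda_N
assert np.allclose(U @ np.diag(eig_val) @ U.T, L)  # L = U Lambda U^T
assert np.allclose(U.T @ U, np.eye(3))             # orthonormal eigenbasis
```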
Note:
G = graphs.Community(N=400, Nc=7)
G.plot()
eig_val, U = sp.linalg.eigh(G.L.toarray())
#eig_val, U = sp.sparse.linalg.eigsh(G.L, k=10, which='SM') # used for SBM
Use sp.linalg.eigh to diagonalize $\mathbf{L}$: it is the eigensolver optimized for symmetric matrices. Note that sp.linalg.eigh cannot take sparse matrices as input. For sparse matrices, optimized eigensolvers exist to compute only the first few eigenvalues and eigenvectors, such as
eig_val, U = sp.sparse.linalg.eigsh(G.L, k=10, which='SM')
The option which='SM' searches for the eigenvalues of smallest magnitude, and which='LM' for those of largest magnitude.
eigsh is nevertheless designed to compute only a few eigenvectors. You will quickly see the advantage of sp.sparse.linalg.eigsh(G.L, k=10, which='SM') over sp.linalg.eigh(G.L.toarray()) if you increase the size of the network to $10^4$ nodes or more, for instance.
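A small self-contained check that the two solvers agree on the first eigenvalues (the random graph below is an arbitrary stand-in for G; the speed difference only shows at much larger sizes):

```python
import numpy as np
import scipy.sparse as sparse
import scipy.sparse.linalg

# Combinatorial Laplacian of a small random (Erdos-Renyi-style) graph.
rng = np.random.default_rng(0)
A = (rng.random((60, 60)) < 0.1).astype(float)
W = np.triu(A, 1)
W = W + W.T                                   # symmetric, no self-loops
L = sparse.csr_matrix(np.diag(W.sum(axis=1)) - W)

# Dense solver: all eigenvalues. Sparse solver: only the 5 smallest.
dense_vals = np.linalg.eigh(L.toarray())[0]
sparse_vals = scipy.sparse.linalg.eigsh(L, k=5, which='SM')[0]
```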
np.finfo(float).resolution
# G_nx = nx.from_numpy_matrix(G.A.toarray())
# sorted(nx.connected_components(G_nx),key = len, reverse=True)
plt.plot(eig_val, '+-')
The Stochastic Block Model (SBM) is a latent variable model of structured random graphs.
Relaxation: suppose that all intra-block probabilities are set to $p_{\text{in}}$ and all inter-block probabilities to $p_{\text{out}}$. We will moreover consider only balanced SBMs (i.e., SBMs whose blocks all have the same size).
In the case of balanced SBMs, instead of the pair of parameters $(p_{\text{in}}, p_{\text{out}})$, consider the pair $(c,\epsilon)$ where $c$ is the average degree of the graph and $\epsilon = \frac{p_{\text{out}}}{p_{\text{in}}}$ is a measure of fuzziness: the larger $\epsilon$, the fuzzier the community structure. The relations between both pairs of parameters are: $$p_{\text{in}} = \frac{kc}{N-k+(k-1)\epsilon N}~~~\text{ and }~~~p_{\text{out}} = \frac{kc\epsilon}{N-k+(k-1)\epsilon N}$$
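These relations can be sanity-checked numerically: the expected degree of a node, $(N/k - 1)\, p_{\text{in}} + (k-1)\frac{N}{k}\, p_{\text{out}}$, should come back to $c$ (the parameter values below are arbitrary):

```python
import numpy as np

N, k, c, eps = 1000, 5, 20, 0.1   # arbitrary example values
p_in = (k * c) / (N - k + (k - 1) * eps * N)
p_out = (k * c * eps) / (N - k + (k - 1) * eps * N)

# A node has N/k - 1 potential intra-block neighbours
# and (k-1) * N/k potential inter-block neighbours.
expected_degree = (N / k - 1) * p_in + (k - 1) * (N / k) * p_out
```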
NOTE: for a fixed triplet $(\epsilon, N, k)$, $c$ is not a free parameter.
The classical SBM inference problem is: given an instance $G$ of an SBM, infer the blocks.
In the balanced (i.e., all blocks have the same size) and sparse (i.e., average degree $c=\mathcal{O}(1)$) case, the following phase transition has been shown (Decelle et al., Massoulié, Abbe et al.): for a given pair $(k, c)$, there exists a critical fuzziness $\epsilon_c$ such that, as $N\rightarrow \infty$,
with $\epsilon_c = \frac{c - \sqrt{c}}{c + \sqrt{c} (k-1)}$
N = 1000 # number of nodes
k = 5 # number of blocks
c = 20 # average degree
epsi_c = (c - np.sqrt(c)) / (c + np.sqrt(c) * (k-1))  # critical fuzziness
epsi = epsi_c / 50 # this is a very strong block structure
pin = (k * c) / (N - k + (k - 1) * epsi * N)
pout = (k * c * epsi) / (N - k + (k - 1) * epsi * N)
z = np.tile(np.arange(k), (N // k, 1))
truth = np.reshape(z.T, (N,1))[:,0] # the ground truth community assignment truth[i] is the community number of node i
G = graphs.StochasticBlockModel(N=N, k=k, z=truth, p=pin, q=pout)
G.compute_laplacian('normalized')
fig, axes = plt.subplots(1, 2)
axes[0].spy(G.L, markersize=0.6)
axes[1].hist(G.L.data, bins=100, log=True);
plt.hist(G.d)
plt.title('Degree distribution of my random graph');
With epsi = epsi_c / 50, i.e., a very strong block structure, the blocks are apparent even visually with spring-based layout algorithms:
G.set_coordinates(kind='spring') #here with a spring based algorithm
G.plot()
G.compute_fourier_basis()
eig_val, U = G.e, G.U
plt.plot(eig_val[:10], '+-')
The large gap between $\lambda_k$ and $\lambda_{k+1}$ is called the "spectral gap": heuristically, it appears when the graph has a strong structure in $k$ communities (and there also exist some theoretical justifications).
Hence, there is a strong structure in $5$ communities here.
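The heuristic can be automated: estimate the number of communities as the position of the largest gap among the first sorted eigenvalues (the eigenvalues below are synthetic stand-ins for eig_val):

```python
import numpy as np

# Synthetic spectrum: 5 near-zero eigenvalues, then a jump.
eig_val = np.array([0.0, 0.02, 0.03, 0.05, 0.06, 0.9, 0.95, 1.0, 1.1, 1.2])

gaps = np.diff(eig_val)                # gaps lambda_{i+1} - lambda_i
k_estimate = int(np.argmax(gaps)) + 1  # largest gap after the k-th eigenvalue
```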
print(U.shape)  # one row per node, one column per eigenvector
G.plot_signal(U[:,0])
G.plot_signal(U[:,1])
G.plot_signal(U[:,2])
G.plot_signal(U[:,3])
G.plot_signal(U[:,4])
G.plot_signal(U[:,5])
G.plot_signal(U[:,6])
Why is the 5th eigenvector so localized?
NOTE that: $$\lambda_k=\mathbf{u}_k^\top\mathbf{L}\mathbf{u}_k=\sum_{i\sim j}\mathbf{W}_{ij} (u_k(i)-u_k(j))^2,$$ such that eigenvectors associated with low eigenvalues tend to be smooth with respect to any path in the network. In block-structured graphs, this usually means quasi-constant within each block. Also: $$\mathbf{u}_2= \text{argmin}_{\mathbf{u}\in\mathbb{R}^N \text{ s.t. } \mathbf{u}_1^\top\mathbf{u}=0 \text{ and } \mathbf{u}^\top\mathbf{u}=1} ~~~\mathbf{u}^\top\mathbf{L}\mathbf{u}.$$ Because $\mathbf{u}_1=\frac{1}{\sqrt{N}}\mathbf{1}$, where $\mathbf{1}$ is the constant vector of ones (this is true for the combinatorial Laplacian), this equation is equivalent to: $$\mathbf{u}_2= \text{argmin}_{\mathbf{u}\in\mathbb{R}^N \text{ s.t. } \sum_i u(i) = 0 \text{ and } \sum_i u(i)^2 = 1} ~~~\mathbf{u}^\top\mathbf{L}\mathbf{u}.$$ In words: $\mathbf{u}_2$ is the normalized vector that minimizes local variation AND has zero average. Similarly, $\mathbf{u}_3$ is the normalized vector that minimizes local variation AND is orthogonal to both $\mathbf{u}_1$ and $\mathbf{u}_2$. Etc.
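Both facts are easy to verify numerically on a toy graph (chosen arbitrarily, and connected so that the zero eigenvalue is simple):

```python
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W
eig_val, U = np.linalg.eigh(L)

# lambda_k equals the local variation of u_k: sum over unordered edges i~j
# (halve the double sum over ordered pairs).
for k_idx in range(4):
    u = U[:, k_idx]
    local_variation = 0.5 * np.sum(W * (u[:, None] - u[None, :])**2)
    assert np.isclose(local_variation, eig_val[k_idx])

# u_1 is constant for the combinatorial Laplacian of a connected graph.
assert np.allclose(U[:, 0], U[0, 0])
```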
Spectral clustering takes advantage of this property. Let us look at a 3-block structure, for better illustration:
Say we want to recover which block each node belongs to. The first eigenvector's information is useless (at least with the combinatorial Laplacian) as it is constant. A solution is to plot each node $i$ in 2D with coordinates taken from two low-frequency eigenvectors, e.g. $(u_2(i), u_5(i))$, i.e., the columns U[:,1] and U[:,4]:
plt.figure()
plt.scatter(U[:,1], U[:,4])
plt.figure()
plt.scatter(U[:,2],U[:,3])
plt.figure()
plt.scatter(U[:,3],U[:,4])
plt.figure()
plt.scatter(U[:,3],U[:,5])
The pair (U[:,1], U[:,4]) provides the clearest separation of the clusters.
Look at plt.scatter(U[:,1], U[:,4], c=G.truth): the nodes of a given block all fall in the same 'blob' of this 2D plot! This observation is the basis of spectral-clustering-type algorithms.
TODO
In the last example, performing $k$-means in the 2D plane defined by $\mathbf{u}_2$ and $\mathbf{u}_5$ recovers the 5-block structure (the ground truth is in G.truth). To measure performance, I use the function AR_index, which computes the Adjusted Rand index (both functions were already imported in the preamble).
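A self-contained sketch of that clustering step (AR_index lives in the notebook's preamble and is not reproduced here; instead, a minimal Lloyd-style $k$-means loop runs on a synthetic 2D embedding standing in for the (U[:,1], U[:,4]) coordinates, and cluster purity is checked against the ground truth):

```python
import numpy as np

rng = np.random.default_rng(0)
truth = np.repeat(np.arange(3), 50)            # ground-truth block labels
centers = np.array([[0., 0.], [1., 0.], [0., 1.]])
embedding = centers[truth] + 0.05 * rng.standard_normal((150, 2))

# Tiny Lloyd loop: assign each point to the nearest centre, recompute centres.
C = embedding[[0, 60, 120]]                    # crude init: one point per block
for _ in range(10):
    dist2 = ((embedding[:, None, :] - C[None, :, :])**2).sum(-1)
    labels = np.argmin(dist2, axis=1)
    C = np.array([embedding[labels == j].mean(axis=0) for j in range(3)])
```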
Plot the average performance of community mining with this spectral clustering algorithm on balanced SBMs versus the fuzziness $\epsilon$.
Read "A tutorial on spectral clustering" by U. von Luxburg to learn, among other things, that a degree-corrected version is in general preferable (for instance, using the normalized Laplacian instead of the combinatorial Laplacian). For arguments on choosing the "best" degree correction, see 'Improved spectral community detection in large heterogeneous networks' by Tiomoko Ali and Couillet.
communities = [100, 150, 80]
G = graphs.Community(N=330, Nc=3, comm_sizes=communities, seed=1)
fig, axes = plt.subplots(1, 2)
axes[0].spy(G.W, markersize=0.5)
G.set_coordinates('community2D')
G.plot(ax=axes[1])
We can use either the combinatorial Laplacian $\mathbf{L} = \mathbf{D} - \mathbf{W}$ or the normalized Laplacian $\mathbf{L} = \mathbf{I} - \mathbf{D}^{-1/2} \mathbf{W} \mathbf{D}^{-1/2}$.
G.compute_laplacian('normalized')
fig, axes = plt.subplots(1, 2)
axes[0].spy(G.L, markersize=0.6)
axes[1].hist(G.L.data, bins=50, log=True);
sns.histplot(G.d, bins=50)  # sns.distplot is deprecated in recent seaborn
A graph signal is a function $\mathcal{V} \rightarrow \mathbb{R}$ that associates a value to each node $v \in \mathcal{V}$ of a graph. The signal values can be represented as a vector $f \in \mathbb{R}^N$ where $N = |\mathcal{V}|$ is the number of nodes in the graph.
signal = np.random.normal(size=G.N)
G.plot_signal(signal)
The gradient $\nabla_\mathcal{G} \ f$ of the signal $f$ on the graph $\mathcal{G}$ is a signal on the edges defined as
$$(\nabla_\mathcal{G})_{(i,j)} \ f = \sqrt{W_{ij}} (f_i - f_j)$$
The differential operator $\mathbf{D} \in \mathbb{R}^{|\mathcal{E}| \times N}$ is defined such that $\mathbf{L} = \mathbf{D}^\intercal \mathbf{D}$. It can be computed with the PyGSP:
G.compute_differential_operator()
print(G.D.shape)
G.D
gradient = G.D @ signal
assert gradient.size == G.Ne
gradient
The gradient of a signal $\mathbf{x}$ is then given by $$\dot{\mathbf{x}} = \nabla_{G} \, \mathbf{x} = \mathbf{D} \mathbf{x} \in \mathbb{R}^{|\mathcal{E}|}.$$
Its value on the edge $(i,j) \in \mathcal{E}$ is given by $$\dot{\mathbf{x}}[(i,j)] = \sqrt{\mathbf{W}[i,j]} \, (\mathbf{x}[i] - \mathbf{x}[j]).$$
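As a cross-check independent of PyGSP, one can build $\mathbf{D}$ by hand on a toy weighted graph (the edge orientation is an arbitrary choice) and verify $\mathbf{L} = \mathbf{D}^\intercal \mathbf{D}$:

```python
import numpy as np

W = np.array([[0, 2, 0],
              [2, 0, 3],
              [0, 3, 0]], dtype=float)
edges = [(0, 1), (1, 2)]

# One row per edge: +sqrt(W_ij) at one endpoint, -sqrt(W_ij) at the other.
D = np.zeros((len(edges), 3))
for e, (i, j) in enumerate(edges):
    D[e, i] = +np.sqrt(W[i, j])
    D[e, j] = -np.sqrt(W[i, j])

L = np.diag(W.sum(axis=1)) - W
```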
Similarly, we can compute the divergence of an edge signal, which is again a signal on the nodes.
$$(\operatorname{div}_\mathcal{G} x)_i = \sum_{j \sim i} \sqrt{W_{ij}} x_{(i,j)}$$
Similarly, the divergence of $\dot{\mathbf{x}}$ is given by $$\operatorname{div} \dot{\mathbf{x}} = \mathbf{D}^\intercal \dot{\mathbf{x}} = \mathbf{D}^\intercal \mathbf{D} \mathbf{x} = \mathbf{L} \mathbf{x} \in \mathbb{R}^N.$$
It is a graph signal which value at node $i \in \mathcal{V}$ is given by $$ \begin{align*} (\operatorname{div} \dot{\mathbf{x}}) [i] &= \sum_{i \sim j} \sqrt{\mathbf{W}[i,j]} \, \dot{\mathbf{x}}[(i,j)] \\ &= \sum_{i \sim j} \mathbf{W}[i,j] \, (\mathbf{x}[i] - \mathbf{x}[j]) \\ &= (\mathbf{L} \mathbf{x})[i]. \end{align*} $$
Note that the above derivations are for the combinatorial Laplacian. For the normalized Laplacian, replace $\mathbf{W}$ by $\mathbf{D}^{-1/2} \mathbf{W} \mathbf{D}^{-1/2}$. As its name implies, the normalized Laplacian is simply a normalization of the edge weights.
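A small numerical illustration of the normalized Laplacian on an arbitrary 3-node star graph: it stays symmetric, and its spectrum lies in $[0, 2]$:

```python
import numpy as np

W = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
d = W.sum(axis=1)

# L = I - D^{-1/2} W D^{-1/2}, computed without forming D explicitly.
L_norm = np.eye(3) - (W / np.sqrt(d)[:, None]) / np.sqrt(d)[None, :]
eig_val = np.linalg.eigvalsh(L_norm)
```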
divergence = G.D.T @ gradient
assert divergence.size == G.N
np.linalg.norm(G.L @ signal - divergence)
G.plot_signal(divergence)
The Laplacian operator is indeed the divergence of the gradient.
In the PyGSP, the gradient and divergence can be computed more efficiently with G.grad() and G.div().
# np.testing.assert_allclose(gradient, G.grad(signal))
# np.testing.assert_allclose(divergence, G.div(gradient))
The smoothness of a signal can be computed by the quadratic form
$$ f^\intercal L f = \sum_{i \sim j} W_{ij} (f_i - f_j)^2 $$
Is our random signal smooth? Our intuition certainly says no. Let's verify by computing the norm of the gradient: $$\| \nabla_\mathcal{G} \, \mathbf{x} \|_2^2 = \langle \mathbf{D} \mathbf{x}, \mathbf{D} \mathbf{x} \rangle = \mathbf{x}^\intercal \mathbf{L} \mathbf{x} = \sum_{i \sim j} \mathbf{W}[i,j] (\mathbf{x}[i] - \mathbf{x}[j])^2.$$
Note that we are normalizing by the norm of the signal, so that its energy does not influence our computation.
signal.T @ G.L @ signal
signal.T @ G.L @ signal / np.linalg.norm(signal)**2
Let's compare it with the partitioning function: $$ x[i] = \begin{cases} -1 &\text{if } i \in S_1, \\ 0 &\text{if } i \in S_2, \\ 1 &\text{if } i \in S_3, \end{cases} $$ where $S_i$ is the set of nodes in partition $i$.
x = np.zeros(G.N)
x[:communities[0]] = -1 * np.ones(communities[0])
x[-communities[-1]:] = 1 * np.ones(communities[-1])
G.plot_signal(x)
x.T @ G.L @ x / np.linalg.norm(x)**2
NOTE: the latter normalized smoothness is much smaller than that of the randomly generated signal, i.e., the partition signal is a lot smoother.
# G.compute_fourier_basis()
# eig_val, U = G.e, G.U
eig_val, U = sp.sparse.linalg.eigsh(G.L, k=10, which='SM')
plt.plot(eig_val[:10], '+-')
G.plot_signal(U[:,0])
G.plot_signal(U[:,1])
G.plot_signal(U[:,2])
Note that the eigengap heuristic applies to the raw graph structure, through the spectrum of its Laplacian; it cannot be applied to signals that live on top of the graph.
# Swiss Roll
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection
n_points = 1000
X, color = datasets.make_swiss_roll(n_points, random_state=0)
fig = plt.figure(figsize=(15, 10))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=color, cmap=plt.cm.Spectral)
ax.view_init(4, -72)
G = graphs.Sensor(seed=42)
G.compute_fourier_basis()
taus = [0, 10, 50]
fig, axes = plt.subplots(len(taus), 2, figsize=(17, 9))
x0 = np.random.RandomState(1).normal(size=G.N)
for i, tau in enumerate(taus):
g = filters.Heat(G, tau)
x = g.filter(x0).squeeze()
x_hat = G.gft(x).squeeze()
G.plot_signal(x, ax=axes[i, 0])
axes[i, 0].set_axis_off()
axes[i, 0].set_title('')
axes[i, 0].text(0, -0.2, '$x^T L x = {:.2f}$'.format(x.T @ G.L @ x))
axes[i, 1].plot(G.e, np.abs(x_hat), '.-')
axes[0, 0].set_title(r'$x$: signal in the vertex domain')
axes[0, 1].set_title(r'$\hat{x}$: signal in the spectral domain')
axes[-1, 1].set_xlabel("Laplacian eigenvalues / graph frequencies");
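What filters.Heat computes, spelled out in the spectral domain on a tiny hand-built path graph: each Fourier coefficient is damped by $e^{-\tau\lambda}$ (PyGSP may additionally rescale the kernel by $\lambda_{\max}$; this sketch ignores that):

```python
import numpy as np

W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W
eig_val, U = np.linalg.eigh(L)

tau = 10.0
x0 = np.array([1.0, 0.0, -1.0])
x_hat = U.T @ x0                           # graph Fourier transform of x0
x = U @ (np.exp(-tau * eig_val) * x_hat)   # low-pass: damp high frequencies
```

Filtering can only decrease the smoothness measure $x^\top L x$, since each spectral term $\lambda\, e^{-2\tau\lambda}\, |\hat{x}(\lambda)|^2$ is no larger than $\lambda\, |\hat{x}(\lambda)|^2$.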
G = graphs.Minnesota()  # the Minnesota road network has a fixed size
G.compute_fourier_basis()
fig, axes = plt.subplots(1, 2)
G.plot_signal(G.U[:, 7], ax=axes[0])
G.set_coordinates('line1D')
G.plot_signal(G.U[:, 1:7], ax=axes[1])
axes[1].legend(['u_{}'.format(i) for i in range(1, 7)])
G.set_coordinates('line1D')
G.plot_signal(G.U[:, 1:7])
plt.legend(['u_{}'.format(i) for i in range(1, 7)])
G = graphs.Sensor(365)
G.compute_fourier_basis()
fig, axes = plt.subplots(1, 2)
G.plot_signal(G.U[:, 7], ax=axes[0])
G.set_coordinates('line1D')
G.plot_signal(G.U[:, 1:7], ax=axes[1])
axes[1].legend(['u_{}'.format(i) for i in range(1, 7)])
G.set_coordinates('line1D')
G.plot_signal(G.U[:, 1:5])
plt.legend(['u_{}'.format(i) for i in range(1, 5)])